A Two-step Approach to Video Retrieval based on ASR transcriptions
Abstract
In this paper, we describe our experiments for the Rich Speech Retrieval Task at the MediaEval Benchmark Initiative 2011. We start with a brief overview of the framework we used and its structure. Our experiments indicate that a two-step retrieval approach and the application of a spell checker can improve the quality of retrieval results in the given scenario. Finally, we discuss other techniques that may further improve the quality of the results.
Similar resources
Recurrent Neural Network-Based Phoneme Sequence Estimation Using Multiple ASR Systems' Outputs for Spoken Term Detection
This paper describes a novel correct phoneme sequence estimation method that uses a recurrent neural network (RNN)-based framework for spoken term detection (STD). In an automatic speech recognition (ASR)-based STD framework, ASR performance (word or subword error rate) affects STD performance. Therefore, it is important to reduce ASR errors to obtain good STD results. In this study, we use an ...
Video Shot Classification Using Lexical Context
Associating concepts with video segments is essential for content-based video retrieval. We present here a semantic classifier that works from text transcriptions produced by automatic speech recognition (ASR). The system is based on a Bayesian classifier and is fully linked with a knowledge base that contains an ontology and named entities from several domains. The system is trained from a set of ...
Building an ASR System for a Low-resource Language Through the Adaptation of a High-resource Language ASR System: Preliminary Results
For many languages in the world, not enough (annotated) speech data is available to train an ASR system. We here propose a new three-step method to build an ASR system for such a low-resource language, and test four measures to improve the system’s success. In the first step, we build a phone recognition system on a high-resource language. In the second step, missing low-resource language acous...
Genre tagging of videos based on information retrieval and semantic similarity using WordNet
In this paper we propose a new approach for the genre tagging task of videos, using only their ASR transcripts and associated metadata. This new approach is based on calculating the semantic similarity between the nouns detected in the video transcripts and a bag of nouns generated from WordNet, for each category proposed to classify the videos. Specifically, we have used the Lin measure based ...
Combining Word and Phonetic-Code Representations for Spoken Document Retrieval
The traditional approach to spoken document retrieval (SDR) uses an automatic speech recognizer (ASR) in combination with a word-based information retrieval method. This approach has shown only limited accuracy, partly because ASR systems tend to produce transcriptions of spontaneous speech with a significant word error rate. In order to overcome this limitation we propose a method which use...
Publication date: 2011